Towards a Bayesian Perspective on Statistical Disclosure Limitation
Author
Abstract
National statistical offices and other organizations collect data on individual subjects (persons, businesses, organizations), while typically assuring the subjects that data pertaining to them will be held confidential. These data provide the raw material for the statistical data products (tabular summaries, microdata files composed of data records pertaining to individual subjects, and, potentially, public statistical data bases and statistical query systems) that the statistical office disseminates to multiple, broad user communities. Statistical disclosure limitation (SDL) refers to the problem of, and the methods for, preventing both the re-identification of a subject and the disclosure of the subject’s confidential data through analysis or manipulation of disseminated data products. SDL methods abbreviate or modify the data product sufficiently to thwart disclosure. SDL problems are typically computationally demanding; several have been shown to be NP-hard. Many SDL methods draw upon statistical, mathematical or optimization theory, but heuristic and partial approaches also abound. Contributions from Bayesian and likelihood perspectives are increasing. Nevertheless, a strong theoretical connection between definitions of statistical disclosure, measurement of disclosure risk, and evaluation of SDL methods is lacking. This gap suggests opportunities for Bayesian, likelihood and hierarchical approaches. Selected opportunities and associated SDL methodological issues are discussed.
Similar resources
Assessing the Risk of Disclosure of Confidential Categorical Data
Disclosure limitation involves the application of statistical tools to limit the identification of information on individuals (and enterprises) included as part of statistical data bases such as censuses and sample surveys. We outline the major issues involved in assessing disclosure risk and assuring the protection of confidentiality for data bases, especially those in the form of multi-way co...
Privacy and Statistical Risk: Formalisms and Minimax Bounds
We explore and compare a variety of definitions for privacy and disclosure limitation in statistical estimation and data analysis, including (approximate) differential privacy, testing-based definitions of privacy, and posterior guarantees on disclosure risk. We give equivalence results between the definitions, shedding light on the relationships between different formalisms for privacy. We also...
Towards Providing Automated Feedback on the Quality of Inferences from Synthetic Datasets
Many national statistical agencies release data to the public that have been altered to protect the confidentiality of data subjects’ identities and sensitive attributes. Unfortunately, for methods of disclosure limitation in practice, it is typically impossible for analysts to gauge how the disclosure limitation has compromised the quality of inferences from the altered data alone. This is par...
Additive noise and multiplicative bias as disclosure limitation techniques for continuous microdata: A simulation study
This paper focuses on a combination of two disclosure limitation techniques, additive noise and multiplicative bias, and studies their efficacy in protecting confidentiality of continuous microdata. A Bayesian intruder model is extensively simulated in order to assess the performance of these disclosure limitation techniques as a function of key parameters like the variability amongst profiles ...
Intruder Testing on the 2011 UK Census: Providing Practical Evidence for Disclosure Protection
With the recent push towards sharing greater amounts of information, the pressure is on National Statistical Institutes (NSIs) to publish more detailed datasets to broader audiences. It is of parallel importance for any such organisation to respect and protect the confidentiality of respondents’ data. Assessing the risk of identification in a dataset is a challenging task and there is much in t...